Determining the number of clusters in a data set, a quantity often labelled ''k'' as in the ''k''-means algorithm, is a frequent problem in data clustering, and is a distinct issue from the process of actually solving the clustering problem.

For a certain class of clustering algorithms (in particular ''k''-means, ''k''-medoids and the expectation–maximization algorithm), there is a parameter commonly referred to as ''k'' that specifies the number of clusters to detect. Other algorithms, such as DBSCAN and OPTICS, do not require the specification of this parameter; hierarchical clustering avoids the problem altogether.

The correct choice of ''k'' is often ambiguous, with interpretations depending on the shape and scale of the distribution of points in a data set and the desired clustering resolution of the user. In addition, increasing ''k'' without penalty will always reduce the amount of error in the resulting clustering, to the extreme case of zero error if each data point is considered its own cluster (i.e., when ''k'' equals the number of data points, ''n''). Intuitively then, ''the optimal choice of ''k'' will strike a balance between maximum compression of the data using a single cluster, and maximum accuracy by assigning each data point to its own cluster''. If an appropriate value of ''k'' is not apparent from prior knowledge of the properties of the data set, it must be chosen somehow. There are several categories of methods for making this decision.

== Rule of thumb ==
One simple rule of thumb sets the number to
:<math>k \approx \sqrt{n/2}</math>
with ''n'' as the number of objects (data points).
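A minimal sketch of both points above, assuming Python with NumPy and scikit-learn and a synthetic data set (none of which are specified in the article): the within-cluster error keeps shrinking as ''k'' grows, approaching zero as ''k'' approaches ''n'', and the rule of thumb for ''n'' = 200 points gives <math>k \approx \sqrt{200/2} = 10</math>.

<syntaxhighlight lang="python">
# Illustrative sketch only; scikit-learn and the synthetic blob data are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

n = 200
X, _ = make_blobs(n_samples=n, centers=4, random_state=0)

# Increasing k without penalty always lowers the within-cluster error;
# at k = n the error would reach zero (every point becomes its own cluster).
for k in (1, 2, 4, 8, 16):
    inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(f"k = {k:2d}   within-cluster sum of squares = {inertia:.1f}")

# Rule-of-thumb estimate: k is roughly sqrt(n / 2).
k_rule = int(round(np.sqrt(n / 2)))
print("rule-of-thumb k =", k_rule)  # sqrt(200 / 2) = 10
</syntaxhighlight>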